62 research outputs found

    Transferring Procedural Knowledge across Commonsense Tasks

    Full text link
    Stories about everyday situations are an essential part of human communication, motivating the need to develop AI agents that can reliably understand these stories. Despite the long list of supervised methods for story completion and procedural understanding, current AI has no mechanisms to automatically track and explain procedures in unseen stories. To bridge this gap, we study the ability of AI models to transfer procedural knowledge to novel narrative tasks in a transparent manner. We design LEAP: a comprehensive framework that integrates state-of-the-art modeling architectures, training regimes, and augmentation strategies based on both natural and synthetic stories. To address the lack of densely annotated training data, we devise a robust automatic labeler based on few-shot prompting to enhance the augmented data. Our experiments with in- and out-of-domain tasks reveal insights into the interplay of different architectures, training regimes, and augmentation strategies. LEAP's labeler has a clear positive impact on out-of-domain datasets, while the resulting dense annotation provides native explainability

    BRAINTEASER: Lateral Thinking Puzzles for Large Language Models

    Full text link
    The success of language models has inspired the NLP community to attend to tasks that require implicit and complex reasoning, relying on human-like commonsense mechanisms. While such vertical thinking tasks have been relatively popular, lateral thinking puzzles have received little attention. To bridge this gap, we devise BRAINTEASER: a multiple-choice Question Answering task designed to test the model's ability to exhibit lateral thinking and defy default commonsense associations. We design a three-step procedure for creating the first lateral thinking benchmark, consisting of data collection, distractor generation, and generation of adversarial examples, leading to 1,100 puzzles with high-quality annotations. To assess the consistency of lateral reasoning by models, we enrich BRAINTEASER based on a semantic and contextual reconstruction of its questions. Our experiments with state-of-the-art instruction- and commonsense language models reveal a significant gap between human and model performance, which is further widened when consistency across adversarial formats is considered. We make all of our code and data available to stimulate work on developing and evaluating lateral thinking models

    Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models

    Full text link
    Retrieval-augmented language models (RALMs) represent a substantial advancement in the capabilities of large language models, notably in reducing factual hallucination by leveraging external knowledge sources. However, the reliability of the retrieved information is not always guaranteed. The retrieval of irrelevant data can lead to misguided responses, and potentially causing the model to overlook its inherent knowledge, even when it possesses adequate information to address the query. Moreover, standard RALMs often struggle to assess whether they possess adequate knowledge, both intrinsic and retrieved, to provide an accurate answer. In situations where knowledge is lacking, these systems should ideally respond with "unknown" when the answer is unattainable. In response to these challenges, we introduces Chain-of-Noting (CoN), a novel approach aimed at improving the robustness of RALMs in facing noisy, irrelevant documents and in handling unknown scenarios. The core idea of CoN is to generate sequential reading notes for retrieved documents, enabling a thorough evaluation of their relevance to the given question and integrating this information to formulate the final answer. We employed ChatGPT to create training data for CoN, which was subsequently trained on an LLaMa-2 7B model. Our experiments across four open-domain QA benchmarks show that RALMs equipped with CoN significantly outperform standard RALMs. Notably, CoN achieves an average improvement of +7.9 in EM score given entirely noisy retrieved documents and +10.5 in rejection rates for real-time questions that fall outside the pre-training knowledge scope.Comment: Preprin

    Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering

    Full text link
    Recent developments in pre-trained neural language modeling have led to leaps in accuracy on commonsense question-answering benchmarks. However, there is increasing concern that models overfit to specific tasks, without learning to utilize external knowledge or perform general semantic reasoning. In contrast, zero-shot evaluations have shown promise as a more robust measure of a model's general reasoning abilities. In this paper, we propose a novel neuro-symbolic framework for zero-shot question answering across commonsense tasks. Guided by a set of hypotheses, the framework studies how to transform various pre-existing knowledge resources into a form that is most effective for pre-training models. We vary the set of language models, training regimes, knowledge sources, and data generation strategies, and measure their impact across tasks. Extending on prior work, we devise and compare four constrained distractor-sampling strategies. We provide empirical results across five commonsense question-answering tasks with data generated from five external knowledge resources. We show that, while an individual knowledge graph is better suited for specific tasks, a global knowledge graph brings consistent gains across different tasks. In addition, both preserving the structure of the task as well as generating fair and informative questions help language models learn more effectively.Comment: AAAI 202

    Effect of Dextrose Equivalent on Maltodextrin/Whey Protein Spray-Dried Powder Microcapsules and Dynamic Release of Loaded Flavor during Storage and Powder Rehydration

    Get PDF
    peer reviewedThe preparation of powdered microcapsules of flavor substances should not only protect these substances from volatilization during storage but also improve their di usion during use. This study aimed to investigate the e ects of maltodextrin (MD) with di erent dextrose equivalent (DE) values on retention of flavor substances during storage, and the dynamic release of flavor substances during dissolution. MDs with three di erent DE values and whey protein isolate were mixed in a ratio of 4:1 as wall materials to encapsulate ethyl acetate, and powdered microcapsules were prepared by spray drying. It was proved that MD could reduce the di usion of flavor substances under di erent relative humidity conditions through the interaction between core material and wall material. During dissolution, MD released flavor substances quickly owing to its superior solubility. The reconstituted emulsion formed after the powder dissolved in water recaptured flavor substances and made the system reach equilibrium. This study explored the mechanism of flavor release during the storage and dissolution of powder microcapsules and should help us understand the application of powder microcapsules in food systems
    • …
    corecore